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Abstract 

Compressive Sensing (CS) is a new technique for the efficient acquisition of signals, images, and 
other data that have a sparse representation in some basis, frame, or dictionary. By sparse we mean 
that the A^-dimensional basis representation has just K <^ N significant coefficients; in this case, 
the CS theory maintains that just M = O {K log N) random linear signal measurements will both 
preserve all of the signal information and enable robust signal reconstruction in polynomial time. In this 
paper, we extend the CS theory to pulse stream data, which correspond to ^-sparse signals/images that 
are convolved with an unknown F-sparse pulse shape. Ignoring their convolutional structure, a pulse 
stream signal \s K = SF sparse. Such signals figure prominently in a number of applications, from 
neuroscience to astronomy. Our specific contributions are threefold. First, we propose a pulse stream 
signal model and show that it is equivalent to an infinite union of subspaces. Second, we derive a lower 
bound on the number of measurements M required to preserve the essential information present in pulse 
streams. The bound is linear in the total number of degrees of freedom S + F, which is significantly 
smaller than the naive bound based on the total signal sparsity K = SF. Third, we develop an efficient 
signal recovery algorithm that infers both the shape of the impulse response as well as the locations and 
amplitudes of the pulses. The algorithm alternatively estimates the pulse locations and the pulse shape 
in a manner reminiscent of classical deconvolution algorithms. Numerical experiments on synthetic and 
real data demonstrate the advantages of our approach over standard CS. 
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I. Introduction 

Digital signal processing systems face two parallel challenges. On the one hand, with ubiq- 
uitous computing power, memory and communication bandwidth, the pressure is on acquisition 
devices, such as analog-to-digital converters and digital cameras, to capture signals at ever 
increasing sampling rates. To date, signal acquisition has been governed by the Shannon/Nyquist 
sampling theorem, which states that all the information contained in a signal is preserved if it 
is uniformly sampled at a rate twice as fast as the bandwidth of its Fourier transform. On the 
other hand, to counter the resulting deluge of Nyquist-rate samples, DSP systems must utilize 
efficient compression schemes that preserve the essential information contained in the signals 
of interest. Transform compression of a discrete-time signal x e involves representing the 
signal in a suitable basis expansion x — ^a, with an x basis matrix, and storing only 
the K largest basis coefficients. The number of large coefficients in a is known as the sparsity 
K of the signal in the basis ^. For many classes of interesting signals, K <^ N, and hence 
efficient signal compression can be achieved. 

An intriguing question can thus be asked: can a system simultaneously attain the twin goals of 
signal acquisition and compression? Surprisingly, the answer in many cases is yes. This question 
forms the core of the burgeoning field of Compressive Sensing (CS) [3,4]. A prototypical CS 
system works as follows: a signal x of length N is sampled by measuring its inner products with 
M <^ N vectors; the output of the sampling system is thus given by the vector y — ^x — ^^a, 
where $ e j^mxat ^ non-invertible matrix. The CS theory states that with high probability, x 
can be exactly reconstructed from y provided that (i) the elements of $ are chosen randomly from 
subgaussian probability distributions, and (ii) the number of samples M is O {Klog{N/K)). 
Further, this recovery can be carried out in polynomial time using efficient greedy approaches 
or optimization based methods [5, 6]. 

For some applications, there exist more restrictive signal models than simple sparsity that 
encode various types of inter-dependencies among the locations of the nonzero signal compo- 
nents. Recent work has led to the development of CS theory and algorithms that are based on 
structured sparsity models that are equivalent to a finite union of subspaces [7, 8]. By exploiting 
the dependencies present among the nonzero coefficients, M can be significantly reduced; for 
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certain structured sparsity models, with high probabiUty the number of measurements M required 
for exact recovery is merely O [K] (without the additional logarithmic dependence on the signal 

length A^). 

Despite the utility of sparsity models, in many real-world sensing applications the assumption 
of sparsity itself is an oversimplification. For example, a electrophysiological recording of a 
neuron is often approximated as a series of spikes but can be better modeled as a series of 
more elongated pulses, the pulse shape being characteristic to the particular neuron. As another 
example, a high-resolution image of the night sky consists of a field of points (corresponding 
to the locations of the stars) convolved with the point spread function of the imaging device. 
Such signals can be modeled as an ^-sparse spike stream that have been convolved with an 
unknown F-sparse impulse response so that the resulting overall sparsity K — SF. We call 
such a signal a pulse streams. For the compressive sensing and recovery of a pulse stream, the 
number of measurements M would incur a corresponding multiplicative increase by a factor of 
F when compared to sensing merely the underlying spike streams; this can be prohibitive in 
some situations. Thus, it is essential to develop a CS framework that can handle not just sparse 
signals but also more general pulse streams. 

In this paper, we take some initial steps towards such a CS pulse stream framework. First, 
we propose a deterministic signal model for pulse streams. We show that our proposed model 
is equivalent to an infinite union of subspaces. Second, as our main theoretical contribution, we 
derive a bound on the number of random linear measurements M required to preserve the essen- 
tial information contained in such signals. The proof relies on the particular high-dimensional 
geometry exhibited by the proposed model. Our derivation shows that M = O {{S + F) log A^); 
i.e., M is proportional to the number of degrees of freedom of the signal S + F but sublinear 
in the total sparsity K — SF. Third, we develop algorithms to recover signals from our model 
from M measurements. Under certain additional restrictions on the signals of interest, one of 
the algorithms provably recovers both the spike stream and the impulse response. We analyze 
its convergence, computational complexity, and robustness to variations in the pulse shape. 
Numerical experiments on real and synthetic data sets demonstrate the benefits of the approach. 
As demonstrated in Figure 1 , we obtain significant gains over conventional CS recovery methods, 
particularly in terms of reducing the number of measurements required for recovery. 
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Fig. 1. (a) Test signal of length N = 1024 obtained by convolving a spike stream with S = 6 with an impulse response 
of length F ~ 11, so that the total signal sparsity K = SF — 66. (b) Profile of one pulse . Signal reconstruction from 
M — 100 random Gaussian measurements performed using (c) a state-of-the-art CS recovery algorithm (CoSaMP [9], 
MSE = 13.42), and (d) our proposed Algorithm 2 (MSE = 0.0028). 



This paper is organized as follows. In Section II, we review the rudiments of standard and 
structured sparsity-based CS. In Section III, we propose a deterministic signal model for pulse 
streams and discuss its geometric properties. In Section IV, we derive bounds on the number of 
random measurements required to sample signals belonging to our proposed model. In Section V, 
we develop an algorithm for stable signal recovery and analyze its convergence and robustness 
to model mismatch. Numerical results are presented in Section VI, followed by a concluding 
discussion in Section VII. 
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II. Background on Compressive Sensing 

A. Sparse signal models 

A signal x e is K -sparse in the orthonormal basis ^ e M^^^ if the corresponding basis 
representation a — contains no more than K nonzero elements. Without loss of generality, 
we assume the sparsity basis ^ to be the identity matrix for R^. The locations of the nonzeros 
of X can additionally be encoded by a binary vector of length with a 1 indicating a nonzero; 

this vector a{x) is called the support of x. Denote the set of all fC-sparse signals in R^ as E^- 
Geometrically, T^k can be identified as the union of (^) subspaces of R^, with each subspace 
being the linear span of exactly K canonical unit vectors of R^. For a general x e R^, we 
define its best X-sparse approximation xk as 

Xk = arg min \\x — ulU- 

Many signals of interest exhibit more complex dependencies in terms of their nonzero values 
and locations. For instance, signals that permit only a small number of admissible support 
configurations can be modeled by a restricted union of subspaces, consisting only of Lk canonical 
subspaces (so that Lk is typically much smaller than (^)). Thus, if E = {ai, (T2, . . . , ctl^} 
denotes the restricted set of admissible supports, then a structured sparsity model [7] is the set 

Mk:= {x:a{x) eT.}. (1) 

B. Signal acquisition via nonadaptive linear measurements 

Suppose that instead of collecting all the coefficients of a vector x e R^, we merely record 
M inner products (measurements) of x with M < N pre-selected vectors; this can be represented 
in terms of a linear transformation y = ^x,^ e R^^^. $ is called the sampling matrix; it is at 
most rank-M and hence has a nontrivial nuUspace. The central result in Compressive Sensing 
(CS) is that despite the non-invertible nature of $, if x is sparse, then it can be exactly recovered 
from y if $ satisfies a condition known as the restricted isometry property (RIP): 

Definition 1: [10] An M x A/^ matrix $ has the X-RIP with constant 5k if, for all x e E^, 

{1-5k)\\x\\1<\\^x\\1<{1 + 5k)\\x\\1. (2) 
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A matrix $ with the X-RIP essentially ensures a stable embedding of the set of all X-sparse 
signals T,k into a subspace of dimension M. The RIP requires $ to leave the norm of every 
sparse signal approximately invariant; also, $ must necessarily not contain any sparse vectors in 
its nuUspace. At first glance, it is unclear if a matrix <l> that satisfies the RIP should even exist 
if M < N; indeed, deterministic design of a sampling matrix having the RIP is an NP-complete 
problem. Nevertheless, it has been shown [10] that provided M > O {K\og{N/K)), a matrix $ 
whose elements are i.i.d. samples from a random subgaussian distribution possesses the RIP with 
high probability. Thus, M can be linear in the sparsity of the signal set K and only logarithmic 
in the signal length N. 

An analogous isometry condition holds for structured sparsity models containing Lk canon- 
ical subspaces [7,8, 11]. This is known as the model-based RIP and is defined thus: $ satisfies 
the M.K-RIP if (2) holds for all x e M.k- It can be shown [1 1] that the number of measurements 
M necessary for a subgaussian sampling matrix to have the Al^-RIP with constant 5 and with 
probability 1 — e~* is bounded as 



We can make two inferences from (3). First, the number of measurements M is logarithmic in 
the number of subspaces in the model; thus, signals belonging to a more concise model can be 
sampled using fewer random linear measurements. Second, M is at least linear in the sparsity 
K of the measured signal. 

C. Recovery methods 

Given measurements y = ^x, CS recovery methods aim to find the "true" sparse signal x 
that generated y. One possible method is to seek the sparsest x that generates the measurements 
y, i.e.. 



where the £o "norm" of a vector x' denotes the number of nonzero entries in x\ This method can 
be used to obtain the true solution x provided M > 2K. However, minimizing the £o norm can 
be shown to be NP-complete and is not stable in the presence of noise in the measurements [10]. 




(3) 



X — argmin ||x'||o subject to y = ^x'. 



x' 



(4) 
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If the sampling matrix $ possesses the RIP, then tractable algorithms for CS recovery can 
be developed. These broadly follow two different approaches. The first approach entails solving 
a convex relaxation of (4), e.g., 

X — argmin subject to y = (5) 

x' 

which corresponds to a linear program and hence can be solved in polynomial time. A common 
variant of this formulation includes accounting for noise of bounded magnitude in the measure- 
ments [6]. The second approach entails an iterative, greedy selection of the support a{x) of the 
true solution x. This approach is employed by several algorithms such as orthogonal matching 
pursuit (OMP) [5], compressive sampling matching pursuit (CoSaMP) [9], and iterative hard 
thresholding [12]. 

Both kinds of approaches provide powerful stability guarantees in the presence of noise while 
remaining computationally efficient. Given noisy measurements of any signal x e so that 
1/ = $a; + n, if $ possesses the RIP, then the signal estimate x obtained by these algorithms has 
bounded error: 

C2 „ „ „ „ 

If ~ An ^ C'lllx - xk\\2 + ~ ^k\\i + C'slplb, (6) 

V K 

where xk is the best X-sparse approximation to x and Ci, C2 are constants. Furthermore, with 
a simple modification, algorithms like CoSaMP and iterative hard thresholding can be used to 
reconstruct signals belonging to any structured sparsity model [7]. 

To summarize, at the core of CS lie three key concepts: a signal model exhibiting a particular 
type of low-dimensional geometry in high-dimensional space, a low-rank linear mapping that 
provides a stable embedding of the signal model into a lower dimensional space, and algorithms 
that perform stable, efficient inversion of this mapping. 

III. Signal Models for Pulse Streams 

Our objective is to extend the CS theory and algorithms to pulse stream signals. The conven- 
tional sparse signal model does not take into account the dependencies between the values 
and locations of the nonzeros in such signals. Indeed, these dependencies cannot be precisely 

captured by any structured sparsity model M.k that merely comprises a reduced subset of the 
subspaces in E^. This necessitates richer models that capture the convolutional structure present 
in the nonzero coefficients of pulse streams. 
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A. General model 

Consider the following deterministic model for signals that can modeled by the convolution 
of an S'-sparse spike stream x E with an F-sparse impulse response h E M^. 

Definition 2: Let Ms C. be a union of S'-dimensional canonical subspaces, as defined 
in (1). Similarly, let M.f C be a union of F-dimensional canonical subspaces. Consider the 
set 

Mg^p :^ {z eR^ : z ^ X *h, such that x E Ms and h E Mf}, O) 

where * denotes the circular convolution operator. Then, Mg p is called a pulse stream model. 
We make two immediate observations: 

1) Commutativity: Owing to the commutative property of the convolution operator, an ele- 
ment z in Ms^F can be represented in multiple ways: 

z — x*h — h*x — Hx — Xh, (8) 

where H (respectively, X) is a square circulant matrix with its columns comprising circularly 
shifted versions of the vector h (respectively, x). Therefore, Definition 2 remains unchanged 
if the roles of x and h are reversed. We exploit this property during signal recovery from CS 
measurements in Section V. 

2) Geometry: It is useful to adopt the following geometric point of view: for a fixed h E Mf, 
the set {h* X : x E Ms} forms a finite union of ^'-dimensional subspaces, owing to the fact 
that it is generated by the action of h on Lg canonical subspaces. Denote this set by h{Ms)- 
Then, the pulse stream model in (7) can be written as 

My= U h{Ms). 

heMp 

Thus, our signal model can be interpreted as an infinite union of subspaces} Note that (3) cannot 
be applied in this case since it only considers finite unions of subspaces. However, let K = SF 
denote the maximum sparsity of the signals in Definition 2. Then, it is clear that the set M^p 
is a very small subset of Sx, the set of all ^F-sparse signals. We exploit this property while 
proving our main sampling results in Section IV. 

'a general theory for sampling signals from infinite unions of subspaces has been introduced in [13]. 
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Note that the exact definition of convolution operator changes depending on the domain of 
the signals of interest. For one-dimensional (ID) time domain signals of length A^, the square 
matrix H is formed by all N circular shifts of the vector h; for 2D images of size N pixels, H 
is formed by all 2D circular shifts of h, and so forth. 

B. Special case: Disjoint pulses 

The model proposed in Definition 2 is general and applicable even to signals in which 
successive pulses overlap with each other. In Section IV we develop a lower bound on the 

number of samples required to preserve the essential information contained in an arbitrary pulse 
stream. However, feasible recovery of such general pulse streams from CS measurements is 
rather difficult; we examine this in detail in Section V. Therefore, we will also consider a more 
restrictive model where the pulses are assumed to not overlap. 

For concreteness, consider ID time domain signals as specified by (8). Note that H and x 
need not be unique for a given z; any ordered pair {aH, x/a) satisfies (8), and so does (iJ', x'), 
where H' is generated by a circularly shifted version of /i by a time delay +r and x' is a 
circularly shifted version of x by — r. To eliminate these ambiguities, we make the following 
two assumptions: 

1) the impulse response h is concentrated, i.e., all the nonzero coefficients of h are contigu- 
ously located in its first F indices. Thus, the structured sparsity model M.f for the vector 
h consists of the lone subspace spanned by the first F canonical unit vectors. 

2) the spikes are sufficiently separated in time. In particular, any two consecutive spikes in the 
vector X are separated at least by A locations, where A > F. A structured sparsity model 
for such time-domain signals with sufficiently separated nonzeros has been introduced 
in [14]. 

The notion of disjoint pulses can be immediately generalized to signals defined over domains 
of arbitrary dimension. Consider ^'-sparse spike streams x defined over a domain of dimension 
n. Suppose that at most one spike in x can occur in a hypercube in M" with side A. This defines 
a special structured sparsity model for the spike streams of interest; denote this model as A4f . 
Further, let the F nonzero coefficients in h be concentrated within a hypercube centered at the 
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domain origin whose side length is no greater than A. Then, a deterministic model for sums of 
non-overlapping pulses of arbitrary dimension can be proposed as follows. 

Definition 3: Let be the structured sparsity model for spike streams as defined above. 
Let M.F be the subspace of concentrated impulse responses of sparsity F. Define the set 

Mis, F, A) = {z e : z ^ X * h, such that x e Mg and heMp}- (9) 

Then, M.{S, F, A) is called the disjoint pulse stream model. 

This model eliminates possible ambiguities that arise due to the shift-invariant nature of 
convolution; i.e., the locations of the nonzero spikes that generate a disjoint pulse stream are 
uniquely defined. This property proves to be essential in developing and analyzing a feasible 
method for signal recovery (Section V). See Figure 1(a) for an example stream of disjoint pulses 
in ID. 

IV. Sampling Theorems for Pulse Streams 

Pulse streams can be modeled as an infinite union of low-dimensional subspaces. The next 
ingredient in the development of a CS framework for such signals is a bound on the number of 
linear samples required to preserve the essential information of this signal set. 

A. General pulse streams 

We derive a sampling theorem for signals belonging to the model M-s^f proposed in Def- 
inition 2. Suppose that K — SF. As mentioned above, M^p is a subset of the set of all 
X-sparse signals E^. On the other hand, only a small fraction of all X-sparse signals can be 
written as the convolution of an S'-sparse spike stream with an F-sparse impulse response. Thus, 
intuition suggests that we should be able to compressively sample signals from this set using 
fewer random linear measurements than that required for the set of all K -sparse signals. The 
following theorem makes this precise. 

Theorem 1: Suppose Mg^p is the pulse stream model from Definition 2. Let t > 0. Choose 
aa M X N i.i.d. subgaussian matrix $ with 

M>o(U{S + F) In (l) + logiLsLp) + t]]. (10) 
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Then, $ satisfies the following property with probability at least 1 — e *: for every pair zi,Z2 e 
My, 

(1 - 5)11^1 - z4l < - ^Z2\\l < (1 + 5)11^1 - Z2\\l (11) 

The proof of this theorem is presented in Appendix A. An important consequence of the 
theorem is that, by definition, A^^ is a subset of the set of all S'-dimensional canonical subspaces. 
In particular. 

Similarly, Lp < (^^) . Therefore, the logarithmic term in the expression for M in (10) scales 
as: 

logiLsLp) < S + S\og{N/S) + F + F\og{N/F) < 2{S + F) logN (13) 

Thus, (10) indicates that the number of measurements M required for the sampling of signals in 
A^l^ is proportional to {S + F). Therefore, M is sublinear in the total sparsity of the signals 
K — SF. In contrast, conventional structured sparsity models would require at least 2K — 2SF 
linear measurements to ensure a stable embedding of the signal set [11]. In addition, the number 
of degrees of freedom of each signal can be considered to be O {S + F), corresponding to the 
positions and locations of the coefficients of the sparse signal and impulse response. Therefore, 
the bound in Theorem 1 is essentially optimal for the signal class M^^p. 

B. Special case: Disjoint pulse streams 

Theorem 1 is valid for signals belonging to the general model M-s^f- the case of disjoint 
pulse streams, we can derive a more stringent lower bound. By definition, the F nonzero 
coefficients of h are concentrated in a hypercube around the domain origin. Therefore, h lies 

in a lone subspace spanned by F basis vectors of M^, and hence Lp = 1. Further, a simple 
modification of Theorem 1 of [14] states that the number of subspaces in the structured sparsity 
model M-s is given by 

fN-SA + S-l\ 
Ls-i . (14) 
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Thus, for the disjoint pulse stream model A4{S, F, A), we obtain the following easy corollary 
to Theorem 1. 

Corollary 1: If t > and 

M>o(^^(^{S + F) In + S log{N/S - A) + ^ , (15) 

then m M X N i.i.d. gaussian matrix $ will satisfy (11) with probability at least 1 — e~* for 
any pair of signals zi, Z2 belonging to the M.{S, F, A) model. 

Note that the parameter A can be at most N/S, since S spikes must be packed into N 
coefficient locations with at least A locations separating any pair of spikes. A higher value of 
A implies that the model M-s admits a smaller number of configurations; thus, (15) implies 
that fewer measurements are needed to sample pulse streams in which the pulses are widely 
separated. 

V. Recovery of Pulse Streams 

The final ingredient in our extended CS framework for pulse streams consists of new algo- 
rithms for the stable recovery of the signals of interest from compressive measurements. This 
problem can be stated as follows. Suppose z e M.s,f- If w are given the noisy measurements 

y = ^z + n = ^Hx + n = ^Xh + n, 

then we aim to reconstruct z from y. The main challenge stems from the fact that both x 
(respectively, X) and h (respectively, H) are unknown and have to be simultaneously inferred. 

This problem is similar to performing sparse approximation with incomplete knowledge of 
the dictionary in which the target vector (either x or h) is sparse. This problem has received 
some interest in the literature [15-17]; the common approach has been to first assume that a 
training set of vectors {xi} exists for a fixed impulse response h, then infer the coefficients of h 
using a sparse learning algorithm (such as LASSO [18] or basis pursuit [6]), and then solve for 
the coefficients {xi}. In the absence of training data, we must infer both the spike locations and 
the impulse response coefficients. Therefore, our task is also similar to blind deconvolution [19]; 
the main differences are that we are only given access to the random linear measurements y as 
opposed to the Nyquist rate samples z, and that our primary aim is to reconstruct z as faithfully 
as possible as opposed to merely reconstructing x. 
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Our general approach will be to fix an estimate of h, obtain the "best possible" estimate of 
X, update our estimate of h, and iterate. This is commonly known as alternating minimization 
(AM) and has been shown to be suitable for blind deconvolution settings [20]. As demonstrated 
below in the proof of Theorem 2, we require that the best possible estimate of the spike stream 
X and the impulse response h at each iteration are unique. For this reason, we will assume that 
our target signal z belongs to the disjoint pulse stream model M.{S, F, A). 

A. Alternating minimization with exhaustive search 

Consider z E M.{S, F, A), so that z = x*h. This implies that the spikes in x are separated by 
a minimum separation distance A and that the impulse response h is concentrated. Suppose first 
that we are given noiseless CS measurements y — ^z. We fix a candidate support configuration 
a for the spike stream (so that a contains S nonzeros.) Then, we form the circulant matrix H 
from all possible shifts of the current estimate of the impulse response h (denote this operation 
as if = C{h)). Further, we calculate the dictionary for the spike stream x, and select 
the submatrix formed by the columns indexed by the assumed spike locations a (denote this 
submatrix as (^i/)^). This transforms our problem into an overdetermined system, which can 
be solved using least-squares. In summary, we use a simple matrix pseudoinverse to obtain an 
estimate for x: 

x^{^H)ly. 

This gives us an estimate of the spike coefficients x for the assumed support configuration a. 
We now exploit the commutativity of the convolution operator *. We form the circulant matrix 
X, form the dictionary for the impulse response and select the submatrix {^X) / formed 
by its first F columns. Then, we solve a least-squares problem to obtain an estimate h for the 
impulse response coefficients: 

h = {^X)\y. 

Then, we form our signal estimate S — x * h. The above two-step process is iterated until a 
suitable halting criterion (e.g., convergence in norm for the estimated signal z). This process is 
akin to the Richardson-Lucy algorithm for blind deconvolution [21]. 
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Algorithm 1 Alternating minimization with exhaustive search 



Inputs: Sampling matrix measurements y — ^x, model parameters A, S, F, threshold e 
Output: z & M{S, F, A) such that y - is small 

x = Q,h = 0, . . . , 0) //F; i = {initialize} 
for a e Mi do 

1. H — C{h), $/i = {^H)cr {form dictionary for spike stream} 

2. X ■<— ^\y {update spike stream estimate } 

3. X — C{x), — {^^) f {form dictionary for impulse response} 

4. h ^ly {update impulse response estimate} 

5. ^ <— X * h {form signal estimate} 

if \\y — $^||2 < e {check for energy in residual} 

return ? 
end if 
end for 



The overall reconstruction problem can be solved by repeating this process for every support 
configuration a belonging to the structured sparsity model Aig and picking the solution with 
the smallest norm of the residual r — y — The procedure is detailed in pseudocode form 
in Algorithm 1. Thus, Algorithm 1 consists of performing alternating minimization for a given 
estimate for the support of the underlying spike stream x, and exhaustively searching for the 
best possible support. Under certain conditions on the sampling matrix we can study the 
convergence of Algorithm 1 to the correct answer z, as encapsulated in the following theorem. 

Theorem 2: Let z G Ai{S, F, A) and suppose that $ satisfies (11) with constant 6 for signals 
belonging to Ai{S, F, A). Suppose we observe y = ^z and apply Algorithm 1 to reconstruct 
the signal. Let % be an intermediate estimate of z at iteration i of Algorithm 1. Then: 

1) The norm of the residual \\y — ^Zi\\2 monotonically decreases with iteration count i. 

2) If at any iteration i 

\\y - ^Zih < e, 
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then we are guaranteed that 

11^ - Zih < ce, 

where c depends only on 5. 
The proof of this theorem is provided in Appendix B. The first part of the theorem implies 
that for any given support configuration a. Steps 1 through 4 in Algorithm 1 are guaranteed to 
converge to a generalized fixed point [22]. The second part of the theorem provides a condition 
on the detection of the true support configuration a in the following weak sense: if the energy 
of the residual of the signal estimate is small, then the signal has been accurately reconstructed. 

B. Model mismatch 

In practical situations, we would expect to have minor variations in the shapes of the S 
pulses in the signal z. In this case, z can no longer be expressed as Hx where if is a circulant 
matrix. Let {/ii, h2, . ■ . , hs} be length-F vectors corresponding to each of the S pulses in the 
signal, and let the length-S* spike stream x = {ai,a2, ■ ■ ■ ,as)- Further, let Sj be the circular 
shift operator that maps the pulse shape hi into its corresponding vector in M^. Then, we 
have 

s 

z ^^aiSiihi), (16) 

i=l 

or equivalently, 

z — Hx, 

where H = [§i(/ii), . . . , §5(/is)] is an iV x S' matrix. Assuming that the spikes in x are separated 
by at least A locations, the matrix H is quasi-Toeplitz [23], i.e., the columns of H are circular 
shifts of one another with no more than one nonzero entry in every row. An attractive property 
of quasi-Toeplitz matrices is that there exist analytical expressions for their pseudo-inverses. 
Suppose the measurement matrix $ equals the identity, i.e., we are given Nyquist-rate samples 
of z. Then the matrices and in Step 2 of Algorithm 1 are also quasi-Toeplitz, and hence 

and can be computed in closed form. Thus, given an estimate of the pulse shape Hq, we 
can derive closed-form expressions for the next impulse reponse estimate. 

Additionally, we can obtain an intermediate estimate for the spike stream x. Suppose the 
innermost loop of Algorithm 1 converges to a fixed point estimate h. Since the least-squares 
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equations are homogenous, we may assume that ||/i||2 = 1 without loss of generality. We dub h 
the anchor pulse for the set of pulse shapes {hi, h2, ■ ■ ■ , hs}. The following theorem provides 

an expression relating the anchor pulse to the component pulse shapes. 

Theorem 3: Consider z as defined in (16). Let h be the anchor pulse for the set of pulse 
shapes {hi, h2,..., hs}. Define q = {hi, h) for i = 1,. . .,S. Then, we have that 

h = ^1^. (17) 

The proof of this theorem is provided in Appendix C. Equation (17) implies that the anchor pulse 
his dL weighted linear combination of the component pulses hi, i — 1, . . . , S, with the weights 
defined by the corresponding spike coefficients ai and the inner products q. The anchor pulse 
remains unchanged if the spike coefficient vector x is multiplied by any constant C. Therefore, 
the anchor pulse can be viewed as a scale-invariant average of the component pulse shapes. 

Theorem 3 applies to Nyquist-rate samples of the signal z. In the case of low-rate CS 
measurements y = ^z, the convergence analysis of Algorithm 1 for the general case of S 
different pulse shapes becomes more delicate. If $ possesses the RIP only for z e A4{S, F, A), 
then it could be that two different pulse streams zi, Z2 (each with varying shapes across pulses) 
are mapped by $ to the same vector in M^, i.e., ^zi — $2^2; thus, the unique mapping argument 
employed in the proof of Theorem 2 cannot be applied in this case. One way to analyze this case 
is to recognize that by allowing arbitrary pulse shapes {hi, h2, ■ ■ ■ , hs}, our space of signals of 
interest is equivalent to a special structured sparsity model that consists of all i^T-sparse signals 
whose non-zeros are arranged in S blocks of size F and the starting locations of consecutive 
blocks are separated by at least A locations. As discussed in Section n, stable CS reconstruction 
for signals from this model requires at least M — 2SF — 2K measurements; thus. Algorithm 1 
converges in the general case given that M is proportional to K. Thus, in the case of arbitrary 
pulse shapes, the number of measurements required by Algorithm 1 is on the same order as the 
number of measurements required for conventional structured sparsity-based CS recovery. 

C. Iterative support estimation 

Algorithm 1 involves iteratively solving a combinatorial number of estimation problems. 
This becomes infeasible for even moderate values of N. A simpler method can be proposed as 
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follows: instead of cycling through every possible support configuration ai for the spike stream 
X, we instead retain an estimate of the support configuration, based on the current estimates 
of the spike stream x and impulse response h, and update this estimate with each iteration. In 
order to ensure that the support estimate belongs to Aig, we leverage a special CS recovery 
algorithm for signals belonging to M.g that is based on CoSaMP [9]. We provide an outline of 
the algorithm here for completeness; see [14] for details. 

At each iteration, given an estimate of the spike coefficients x, we need to solve for the best 
A^f -approximation to x. Let x — {xi,X2, ■ ■ ■ , Xn)"^. Given any binary vector s — {si, S2, ■ ■ ■ , sat)^ 
of length N, let: 

X\s ■= {SlXi, S2X2, SnXn), 

so that is the portion of the signal x lying within the support s. Our goal is to solve for 
the choice of support s so that belongs to and ||a; — x\s\\2 is minimized. The following 
constraints on the support vector s follow from the definition of A^f : 

S1 + S2 + ... + SN < S, (18) 
Sj + Sj+i... + Sj+A-i < 1, for j = 1, . . . ,7V, (19) 

where the subscripts are computed modulo N. The first inequality (18) specifies that the solution 
contains at most S nonzeros; the other N inequalities (19) specify that there is at most one spike 
within any block of A consecutive coefficients in the solution. 

It can be shown that minimizing ||a; — x\s\\2 is equivalent to maximizing c^s where c = 
{xl,xl, . . . , xlf), i.e., maximizing the portion of the energy of x that lies within s. Define W e 
]g(JV+i)xAr ^ ijinary indicator matrix that captures the left hand side of the inequality constraints 
(18) and (19). Next, define u e R^+^ = (5", 1, 1, . . . , 1); this represents the right hand side of the 
constraints (18) and (19). Thus, we can represent (18) and (19) by the following binary integer 
program: 

s* = argmin c^s, subject to Ws < u. 

Next, we relax the integer constraints on s to obtain a computationally tractable linear program. 
Denote this linear program by D( ). In [14], it is shown that the solutions to the integer program 
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Algorithm 2 Iterative support estimation 



Inputs: Sampling matrix measurements y — ^ 
Output: Ai{S, F, A)-sparse approximation z' to t 
Initialize x = , h = 0, . . . , 0), i = 
while halting criterion false do 

1. i ^ i + 1 

2. z X * h 

{estimate spike locations and amplitudes} 

3. H = C(h),(^h = ^H 

4. e ^ - 

5. CO ^ (r{B{e)) 

6. (7 •<— a; U (7(xj_i) 

7. X\a ^ i^hVaV, X\^C = 

S.x^ B){x) 

{estimate impulse response} 

9. X = C(x),$, = (<l>X)y 

10. h ^ ^ly 
end while 
return F ■<— x * /i 



+ n, model parameters A, S, F. 
e signal z 



{current pulse stream estimate} 

{form dictionary for spike stream} 

(residual) 

{obtain model-approximation of residual) 
{merge supports} 
{update spike stream estimate} 
{prune spike stream estimate} 

{form dictionary for impulse response) 
{update impulse response estimate) 



and its relaxed version are identical. Thus, we have a computationally feasible method to obtain 
an estimate of the support of the best A^f -approximation to x. 

Once an updated support estimate has been obtained, we repeat Steps 2, 3 and 4 in Algo- 
rithm 1 to solve for the spike stream x and impulse h. This process is iterated until a suitable 
halting criterion (e.g., convergence in norm for the estimated pulse stream £) The overall 
algorithm can be viewed as an iterative sparse approximation procedure for the model 
that continually updates its estimate of the sparsifying dictionary. The procedure is detailed in 
pseudocode form in Algorithm 2. 



18 



D. Stability and convergence 

Like many other algorithms for blind deconvolution, the analysis of Algorithm 2 is not 
straightforward. The dictionaries $X and $if are only approximately known at any intermediate 
iteration, and hence the proof techniques employed for the analysis for CoSaMP do not apply. 
In principle, given access to a sufficient number of measurements, we may expect similar 
convergence behavior for Algorithm 2 as Algorithm 1. Empirically, Algorithm 2 can be shown 
to be stable to small amounts of noise in the signal as well as in the CS measurements and 
to minor variations in the pulse shape. We demonstrate this with the help of various numerical 
experiments in Section VI. 

E. Computational complexity 

The primary runtime cost of Algorithm 2 is incurred in solving the linear program D(-). For 
a length- A/^ signal, the computational complexity of solving a linear program is known to be 
O {N^-^'). The total computational cost also scales linearly in the number of measurements M 
and the number of iterations T of the outer loop executed until convergence; thus, overall the 
algorithm runs in polynomial time. 

VI. Numerical experiments 

We now present a number of results that validate the utility of our proposed theory and meth- 
ods. All numerical experiments reported in this section have been performed using Algorithm 2 
for recovery of disjoint pulse streams. 

A. Synthetic ID pulse streams 

Figure 1 demonstrates the considerable advantages that Algorithm 2 can offer in terms of the 
number of compressive measurements required for reliable reconstruction. The test signal was 
generated by choosing S — S spikes with random amplitudes and locations and convolving this 
spike stream with a randomly chosen impulse response of length F = 11. The overall sparsity 
of the signal K = SF = 88; thus, standard sparsity-based CS algorithms would require at least 
2K = 176 measurements. Our approach (Algorithm 2) returns an accurate estimate of both the 
spike stream as well as the impulse response using merely M — 90 measurements. 
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Fig. 2. Normalized reconstruction MSE vs. Mj K for diiferent reconstruction algorithms averaged over 200 sample 
trials. Signal parameters: N ~ 1024, = 8, F = 11. Algorithm 2 outperforms standard and structured sparsity -based 
methods, particularly when Mj K is small. 



Figure 2 displays the averaged results of a Monte Carlo simulation of Algorithm 2 over 
200 trials. Each trial was conducted by generating a sample signal belonging to A^(S', F, A), 
computing M linear random Gaussian measurements, reconstructing with different algorithms, 
and recording the magnitude of the recovery error for different values of M / K. It is clear from 
the figure that Algorithm 2 outperforms both conventional CS recovery (CoSaMP [9]) with 
target sparsity K = SF as well as block-based reconstruction [7] with knowledge of the size 
and number of blocks (respectively F and S). In fact, our algorithm performs nearly as well as 
the "oracle decoder" that possesses perfect prior knowledge of the impulse response coefficients 
and aims to solve only for the spike stream. 

We show that Algorithm 2 is stable to small amounts of noise in the signal and the measure- 
ments. In Figure 3, we generate a length N = 1024 signal from a disjoint pulse stream model 
with S = 9 and F = 11; add a small amount of Gaussian noise (SNR = 13.25dB) to all its 
components, compute M = 150 noisy linear measurements, and reconstruct using Algorithm 2. 
The reconstructed signal is clearly a good approximation of the original signal. 

B. Neuronal signals 

We test Algorithm 2 on a real-world neuronal recording. Figure 4(a) shows the temporal 
electrochemical spiking potential of a single neuron. The shape of the pulses is characteristic of 
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(a) 



(b) 



Fig. 3. (a) Synthetic noisy signal (SNR = 13.25dB). (b) Recovery from M — 150 random measurements using 
Algorithm 2. 
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Fig. 4. CS recovery of a real-world neuronal signal, (a) Original recording, (b) Recovered signal using M = 150 
random measurements, (c) Estimated anchor pulse shape (F — 11). 



the neuron and should ideally be constant across different pulses. However, there exist minor 
fluctuations in the amplitudes, locations and profiles of the pulses. Despite the apparent model 
mismatch, our algorithm recovers a good approximation to the original signal (Figure 4(b)) as 
well as an estimate of the anchor pulse shape (Figure 4(c)). 

C. Synthetic 2D pulse streams 

Theorem 1 and Algorithm 2 can easily be extended to higher dimensional signals. For 
instance, suppose that the signals of interest are 2D images that can be modeled by a sparse 
sum of disjoint 2D pulses. We test Algorithm 2 on a synthetic image (Figure 5(a)) of size 

= 64 X 64 = 4096 that comprises S = 7 spikes blurred by an unknown 2D impulse response 



21 



(a) (b) (c) 

Fig. 5. Example CS recovery of a sum of 2D pulses, (a) Synthetic test image: N = 4096, S = 7, F = 25. Images are 
recovered from M = 290 random Gaussian measurements using (b) CoSaMP (MSE = 16.95), and (c) Algorithm 2 
(MSB = 0.07). 

of size F = 5 X 5 = 25, so that the overall sparsity K — SF — 175. We acquire merely M — 290 
random Gaussian measurements (approximately 7% the size of the image N) and reconstruct 
the image using CoSaMP as well as Algorithm 2. We assume that both algorithms possess an 
oracular knowledge of the number of spikes S as well as the size of the impulse response F. 
Figure 5 displays the results of the reconstruction procedures using CoSaMP and Algorithm 2. 
It is evident both perceptually and in terms of the MSE values of the reconstructed images that 
our proposed approach is superior to traditional CS recovery. 

D. Astronomical images 

Finally, we test Algorithm 2 on a real astronomical image. Our test image is a = 64 x 64 
region of a high-resolution image of V838 Monocerotis (a nova-like variable star) captured by 
the Hubble Space Telescope [24] (highlighted by the green square in Figure 6(a)). Note the 
significant variations in the shapes of the three large pulses in the test image (Figure 6(b)). We 
measure this image using M — 330 random measurements and reconstruct using both CoSaMP 
and Algorithm 2. For our reconstruction methods, we assumed an oracular knowledge of the 
signal parameters; we use S = 3, F = 120, K = 360 and A = 20. As indicated by Figure 6, 
conventional CS does not provide useful results with this reduced set of measurements. In 
contrast. Algorithm 2 gives us excellent estimates for the locations of the pulses. Further, our 
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(c) (d) 

Fig. 6. (a) Black-and-white image of V838 Monocerotis, a nova-like star, captured by the Hubble Space Telescope 
on February 8, 2004 [24]. (b) Test image is a zoomed in-version of the region highlighted in green (resolution N = 
64 X 64 = 4096). Reconstruction of test image is performed from M — 330 random Gaussian measurements using 
(c) CoSaMP and (d) Algorithm 2. 



algorithm also provides a circular impulse response estimate that can be viewed as the anchor 
pulse of the three original pulses. 

VII. Discussion and Conclusions 

In this paper, we have introduced and analyzed a new framework for the compressive sampling 
of pulse streams. Our signals of interest are modeled as an infinite union of subspaces which 
exhibits a particular geometric structure. This structure enables us to quantitatively deduce the 
number of random linear measurements needed to sample such signals. We have proposed 
two methods for signal recovery. Our first method (Algorithm 1) is relatively easy to analyze, 
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but suffers from combinatorial complexity. Our second method (Algorithm 2) is a feasible, if 
suboptimal, algorithm and formed the basis for our numerical experiments. While our framework 
is applicable to signals defined over domains of arbitrary dimension, we have illustrated its 
benefits in the context of ID time signals and 2D images. 

There are several avenues for future work. We have discussed sparse signals and images as 
represented in the identity basis; our method can be extended to wavelet-sparse and Fourier-sparse 
signals. While our results are promising, we still do not possess a complete characterization of 
the convergence properties of Algorithm 2 as well as its sensitivity to factors such as noise and 
model mismatch under random projections. Additionally, it is unclear how to deal with situations 
where the pulses in the signal of interest are allowed to overlap. To the best of our knowledge, 
the issue of robust recovery of signals convolved with an unknown arbitrary impulse response 
is an open question even for the case of Nyquist-rate samples. We defer these challenging open 
questions to future research. 

The framework developed in this paper can be related to various existing concepts in the 
literature such as best basis compressive sensing [15], simultaneous sparse approximation and 
dictionary learning [25], and the classical signal processing problem of blind deconvolution [19]. 
Compressive sensing of time-domain pulse streams has been studied by Naini et al. [17]. 
However, in their setting the impulse response is assumed to be known, and hence the CS 
measurement system can be viewed as a modification of random Fourier subsampling. 

Our framework is related to recent results on compressed blind deconvolution by Saligrama 
and Zhao [26]. As opposed to pulse streams, their signals of interest consist of sparse signals 
driven through an all-pole auto-regressive (AR) linear system. They propose an optimization- 
based algorithm for recovery of the signal and impulse response from CS measurements. How- 
ever, their measurement system is tailored to impulse responses corresponding to AR linear 
models; our approach can handle arbitrary impulse responses. Further, our main theoretical result 
indicates that the number of measurements to compressively sample a pulse stream is linear only 
in the number of degrees of freedom of the signal and thus answers an open question (Remark 
3.1) posed by the authors in the affirmative. 

Finally, the main approach in this paper can be related to recent work by Asif et al. [27, 
28], who propose channel coding methods to combat the effect of unknown multipath effects 
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in a communication channel that can be described by a sparse impulse response. Their coding 
strategy follows the one advocated by Candes and Tao [29]: their channel code consists of 
a random matrix $ G ]]ja^x7v ^\Yere M > N, so that the linear mapping y = $x is now 
not undercomplete, but overcomplete. Thus, their observations consist of an unknown sparse 
channel response h convolved with the transmitted signal y and their objective is to reconstruct 
the original signal x. The main aspects of our theoretical analysis could conceivably be modified 
to quantify system performance in this setting. 
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Appendix A 

We prove Theorem 1. By Definition 2, the model Ms^p is generated via the convolution 
operation by the structured sparsity models Ais and Aip- Recall that both structured sparsity 
models are themselves defined in terms of canonical subspaces of and their convolution 
results in a low-dimensional geometrical structure that is best described by an infinite union of 
subspaces. Thus, if x e Ms lies in a particular subspace Q and h e M.f lies in a particular 
subspace A, then every signal z e M.s,f can be identified with at least one infinite union of 
subspaces Uq^a- The overall approach is as follows: we first construct a net of points Q in 
such that 

min \\z — oil < S, 

for all z e Uq^a with ||2;|| = 1 and some constant 5. We then apply the Johnson-Lindenstrauss 
Lemma [30] for stable embedding of point clouds to this finite set of points Q, and extend the 
stable embedding to all possible signals z E Un,A- Finally, we derive our main result through a 
union bound over all possible choices of subspaces and A. 

Consider a fixed vector /i e A. Suppose the coefficients of h are normalized so that \\h\\ — 1. 
By virtue of its circulant nature, the spectral norm of the corresponding matrix \\H\\ < 1. Now, 
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consider a fixed yS-dimensional subspace Q e It is easy to see that 

also forms an ^'-dimensional subspace in R^. Thus, by Lemma 5.1 of [31], we can find a finite 
set of points Qu,h C Vth with cardinality |(5a,/i| < (3/^')'^ ^^ch that 



This is an upper bound on the size of Qn,^; assuming a worst-case scenario, we may list out 
the points in this set so that 

QQ,h = ?2, • • • , ?(3/5')^} = {Hxi, Hx2, . . . , i?a;(3/5/)s}. 

Select any xi e {xi, . . . ,x^3/s')s} and an F-dimensional subspace A E M.f- Form the 
circulant matrix Xf, as above, ||Xi|| < 1. Therefore, 

= = Xih \ h e A} 

forms an F-dimensional subspace. Correspondingly, we can find a set of points Qxi,A C flxi 
with cardinality \Qxi,a\ < (3/5')^ such that 



Thus, we have identified a finite set of points Qq^a in the infinite union of subspaces Uq^a. 
Observe that the cardinality of this set Qn,A = (3/5') ''(3/^')^ • Then, every vector in Un,A with 
magnitude less than 1 lies 'close' to at least one point in Qn,A^ i-C-, Qn,A is a 5"-net for Uq^a- 
Suppose 5 — 25". By the Johnson-Lindenstrauss Lemma, if $ e M^^^ with the elements of $ 
drawn from a random gaussian distribution, then for every pair of vectors zi,Z2 G U^i,a, (H) 
will hold with failure probability 



min \\Hx — q\\ < S', V \\x\\ < 1, x G Q. 



min \\Xih - q\\ < 5' , V < 1, /i G A. 



Using this process, define Qxi,a for Z = 1, 2, . . . , {3/S')^. Then, we have 
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This is for a fixed pair of subspaces (fi,A) e M.s x M.f- There are Lg x L^? such pairs of 
subspaces. Applying a simple union bound over all possible pairs, we obtain the overall failure 
probability as 

P < 5Z ^"'^ - ^^^^ 
(n,A) 

Rearranging terms, we have that for a suitably chosen constant C (that depends on cq) and for 
any i > 0, if 

M>C {\og{LsLF) + {S + F) log 0^ + t 

the failure probability for the sampling bound is smaller than e~*. The theorem follows easily 
from this result. 

Appendix B 

We prove Theorem 2. Let % = Xi * hi be any intermediate estimate of Algorithm 1. Let 
Hi — C{hi). Suppose that our candidate configuration for the support of x is given by the 
sparsity pattern a belonging to the structured sparsity model M.^. Then, if {■)a- indicates the 
submatrix formed by the columns indexed by a, the dictionary for the spike stream is given by 
{^Hi)cr — ^{Hi)cr. By virtue of the least-squares property of the pseudo-inverse operator, the 
subsequent estimate Xi+i according to Step 2 is given by 

Xi+i = argmin \\y - (^{Hi)ax\\l, (20) 

X 

where x belongs to the /^-dimensional subspace defined by the support configuration a. Since 
we are minimizing a convex loss function (squared error) on a subspace in M^, the minimum 
Xi+i is unique. Therefore, we may view Step 2 of the algorithm as a unique-valued infimal 
map f from a given hi e M.f to a particular Xj+i e A^f - Similarly, we may view Step 4 of 
Algorithm 1 as another unique- valued infimal map g from M-s to M.f- Therefore, the overall 
algorithm is a repeated application of the composite map fog. From a well-known result on 
single- valued infimal maps [22,32], the algorithm is strictly monotonic with respect to the loss 
function. Thus, the norm of the residual y — ^% decreases with increasing iteration count i. 




g-co(5/2)M 
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Further, any intermediate estimate % also belongs to the model A4(5', F, A). We know from 
(11) that 

\\y - = \\<^>z - (^>Zi\\l > (1 - 6)\\z - ZiWl 

Therefore, if \\y — < e, \\z — %{] < e/Vl — 

Appendix C 

We prove Theorem 3. Suppose the target signal z is composed of S pulses {hi, . . . , hs}, so 
that 

z — Hx, 

where H = .... S5(/i5)] is an iV x S* matrix and x = (ai, a.^-, ■ ■ ■ , as). Assume we are 

given access to the Nyquist samples z, i,e., $ = Inxn- Suppose the estimate of the impulse 
response at an intermediate iteration is given by h. Let H be the matrix formed by the operator 
C(-) acting on h and let a be the candidate support configuration for the spike stream, so that 
the dictionary $h in this case is given by the submatrix H^^. Note that Ha- is quasi-Toeplitz, 
owing to the assumption that the separation A is at least as great as the impulse response length 
F. Thus, Step 2 of Algorithm 1 can be represented by the least-squares operation 



X 



= Hlz. 



Due to the quasi-Toeplitz nature of H^, the pseudo-inverse = (Hj Ha)~^Hj essentially 
reduces to a scaled version of the identity multiplied by the transpose of H (the scaling factor 
is in fact the squared norm of h). Thus, the spike coefficients are given by 

1 



X — 



-HJHx. 



Simplifying, we obtain the expression for the estimated spike coefficient as 

{hi,h) 



OLi — 0£i 



If h is normalized, we may write ctj = CjCtj, where q = {hi, h). 
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Once the spike coefficients x — {ciai, . . . ,csas) have been estimated, we can form the 
dictionary $2. by considering the quasi-Toephtz matrix X formed by the operation C{x). In the 
same manner as above, an updated estimate of the pulse shape h is given by 

h^X^z = X^z. 

Writing out X and z in terms of (/ii, . . . , hs) and (cti, . . . , as) and simplifying, we obtain 

Es 22' 
i=l CKj 

where Cj — {hj, h). Thus, we have a closed-expression for the updated estimate of the impulse 
response coefficients h in terms of the previous estimate h. In the event that the algorithm 

converges to a fixed point, we can replace h by h, thus proving the theorem. 
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